WIP: Port the init binary code to Rust #670
Maybe this would also be a good opportunity to move this build.rs out of the devices crate too.
Not sure what it should be called, maybe init-blob? I'm thinking it should literally be a crate that has one public constant (the init binary) and this build.rs.
For now the devices crate can depend on this init-blob as usual, but I plan to change that. I may end up stacking multiple PRs on top of this one (depending on how long it takes to merge), and they need this to be a separate crate [1], so it would really simplify the rebases for me.
Footnotes
1. I want to make the VMM crate depend on this init-blob and not on the fs device itself (the fs device will just receive a list of virtual files in its constructor). This is in preparation for the 2.0 Rust API.
158d388 to 1719f2f
Move the init binary build script and include_bytes!() from the devices crate into a new init-blob crate. The passthrough modules reference the binary as init_blob::INIT_BINARY instead of using include_bytes! directly. build.rs based on code from containers#593. Co-authored-by: Geoffrey Goodman <geoff@goodman.dev> Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Replace the private next_inode AtomicU64 inside PassthroughFs with a shared InodeAllocator that is passed in at construction. This lets multiple layers (e.g. a future virtual-inode overlay) allocate from the same counter without implicit coordination via reserved ranges. PassthroughFs::new() and PassthroughFsRo::new() now take an Arc<InodeAllocator> parameter. FsWorker::new() creates the allocator and passes it through. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
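The shared-counter idea in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the method names `new` and `alloc`, the starting value, and the `Layer` stand-in are all assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shape of a shared inode allocator; the real type may differ.
pub struct InodeAllocator {
    next: AtomicU64,
}

impl InodeAllocator {
    /// `first` is the first inode number handed out (an assumption of this sketch).
    pub fn new(first: u64) -> Self {
        Self { next: AtomicU64::new(first) }
    }

    /// Returns a fresh inode number; no two calls return the same value.
    pub fn alloc(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Stand-in for a filesystem layer (passthrough or overlay) that holds
/// a handle to the shared allocator instead of a private counter.
pub struct Layer {
    inodes: Arc<InodeAllocator>,
}

impl Layer {
    pub fn new(inodes: Arc<InodeAllocator>) -> Self {
        Self { inodes }
    }

    pub fn new_inode(&self) -> u64 {
        self.inodes.alloc()
    }
}
```

Because every layer draws from the same atomic counter, no layer needs to know which ranges the others use.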
Introduce AugmentFs<T>, a generic overlay that wraps any FileSystem implementation and intercepts FUSE operations for virtual inodes — synthetic read-only files backed by static data. One-shot files can only be looked up once. The overlay uses the shared InodeAllocator to assign inode numbers, so virtual and passthrough inodes never collide. Remove all init.krun special-case code (init_inode, init_handle, INIT_CSTR, init_payload) from both the Linux and macOS passthrough implementations. The init.krun virtual file is now configured via VirtualEntry in the krun API layer and handled generically by the overlay. FsDeviceConfig carries a Vec<VirtualEntry> and FsWorker wraps AugmentFs<PassthroughFs> / AugmentFs<PassthroughFsRo>. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
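The one-shot lookup rule described above could look roughly like this. The `VirtualFile` and `VirtualTable` types are hypothetical and model only the name lookup; a real overlay would presumably keep the data reachable by inode so an already-opened file can still be read.

```rust
use std::collections::HashMap;

/// Hypothetical virtual file: static data plus a one-shot flag.
pub struct VirtualFile {
    pub inode: u64,
    pub data: &'static [u8],
    pub one_shot: bool,
}

/// Hypothetical name table for the virtual files of one directory.
#[derive(Default)]
pub struct VirtualTable {
    by_name: HashMap<String, VirtualFile>,
}

impl VirtualTable {
    pub fn insert(&mut self, name: &str, file: VirtualFile) {
        self.by_name.insert(name.to_string(), file);
    }

    /// Returns the inode on a hit. A one-shot entry is removed from the
    /// name table, so every later lookup of the same name misses.
    pub fn lookup(&mut self, name: &str) -> Option<u64> {
        let one_shot = self.by_name.get(name)?.one_shot;
        if one_shot {
            self.by_name.remove(name).map(|f| f.inode)
        } else {
            self.by_name.get(name).map(|f| f.inode)
        }
    }
}
```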
Add API to prevent the default init binary (/init.krun) from being
injected into the root filesystem. Follows the existing
krun_disable_implicit_{console,vsock} pattern.
Must be called before krun_set_root().
Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Add C API to inject arbitrary virtual files into a virtiofs device. The file appears in the root directory of the specified mount and is backed entirely by host memory. Supports one-shot semantics (the file can only be looked up once). The data pointer follows the same lifetime contract as other krun APIs: the caller must keep the memory valid until krun_start_enter() returns. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Add API to retrieve the built-in default init binary. Callers that use krun_disable_implicit_init() can use this to obtain the init binary and inject it themselves via krun_fs_add_overlay_file(). Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
NullFs implements the FileSystem trait with just an empty root directory. It can be wrapped with AugmentFs to serve virtual files without any host directory involvement. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
krun_set_root_disk_remount no longer creates a temporary empty host directory. Instead it configures a NullFs-backed virtiofs device (shared_dir: None) with init.krun overlaid via AugmentFs. Fs::new() now accepts Option<String> for shared_dir — None selects NullFs. FsDeviceConfig and FsServer gain the corresponding variants. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
The temporary root directory hack is gone (replaced by NullFs), so the ioctl that cleaned it up and the config flag that gated it are no longer needed. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
The exit-code ioctl is a krun mechanism, not a filesystem operation. Move it to the AugmentFs where it is handled before any delegation to the inner filesystem. The Linux passthrough retains only EXPORT_FD (which needs access to passthrough-internal handle and export tables). The macOS passthrough no longer implements ioctl at all (the trait default returns ENOSYS for any cmd that reaches it). Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Boot a VM with a pure NullFs root — no host directory at all. Every
file in the root (init.krun, guest-agent, .krun_config.json, test
data) is injected as a virtual overlay, and /dev, /proc, /sys are
virtual empty directories used as mount points.
The guest verifies:
- One-shot files (init.krun, guest-agent, .krun_config.json) are
gone after being consumed
- Persistent files (marker.txt, testdata.bin) survive and are
re-readable
- Write access to virtual files is denied (EACCES)
- stat reports correct sizes
- Range reads at various offsets return correct data
- Read past EOF returns zero bytes
Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
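The range-read and read-past-EOF checks above correspond to a clamped slice of the backing data. A minimal sketch, assuming a hypothetical helper `read_virtual` (not a name from the PR):

```rust
/// Returns the bytes of `data` in `[offset, offset + size)`, clamped to the
/// end of the file. A read that starts at or past EOF yields an empty slice.
pub fn read_virtual(data: &[u8], offset: u64, size: u32) -> &[u8] {
    let len = data.len() as u64;
    let start = offset.min(len);
    let end = start.saturating_add(u64::from(size)).min(len);
    &data[start as usize..end as usize]
}
```

Clamping both ends means a short read near EOF and a zero-byte read past EOF come from the same code path.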
Boot from an ext4 block device via krun_set_root_disk_remount. The virtiofs root uses NullFs with init.krun and virtual mount-point directories overlaid. The guest verifies it pivoted to the block device root successfully. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Replace the C-based build_default_init() in src/devices/build.rs with a Rust crate (init/) compiled via a cargo subprocess. The new build.rs probes whether the active rustc supports the x86_64-unknown-linux-musl target (for a static binary) and falls back to the native target with a user-visible warning if not. The KRUN_INIT_BINARY_PATH override mechanism is preserved so that out-of-tree binaries (e.g. pre-built SEV or TDX images) can still be injected without rebuilding. Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me> Assisted-by: Claude Code:claude-sonnet-4.6
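The decision the build script makes can be sketched as a pure function. The `InitPlan` enum and `plan_init_build` function are hypothetical, and the assumption that the KRUN_INIT_BINARY_PATH override takes precedence over building is inferred from the description, not confirmed by the code.

```rust
use std::path::PathBuf;

/// Hypothetical outcome of the build-script decision.
#[derive(Debug, PartialEq)]
pub enum InitPlan {
    /// Use the binary named by KRUN_INIT_BINARY_PATH as-is.
    UsePrebuilt(PathBuf),
    /// Build the init crate for an explicit target triple.
    BuildFor(String),
    /// Fall back to the native target (the real script also warns the user).
    BuildNative,
}

pub const MUSL_TARGET: &str = "x86_64-unknown-linux-musl";

/// `override_path` is the value of KRUN_INIT_BINARY_PATH, if set;
/// `musl_supported` is the result of probing the active rustc.
pub fn plan_init_build(override_path: Option<&str>, musl_supported: bool) -> InitPlan {
    match override_path {
        Some(path) if !path.is_empty() => InitPlan::UsePrebuilt(PathBuf::from(path)),
        _ if musl_supported => InitPlan::BuildFor(MUSL_TARGET.to_string()),
        _ => InitPlan::BuildNative,
    }
}
```

In an actual build.rs, the fallback branch would typically surface its warning by printing a `cargo:warning=` line.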
Add init/src/fs.rs with:
- mount_once(): helper that treats EBUSY as success
- mount_filesystems(): mounts devtmpfs, proc, sysfs, cgroup2, devpts, tmpfs(/dev/shm), and creates the /dev/fd symlink
- is_mount_point(): parses /proc/mounts (avoids triggering Podman auto-mounts that stat() would cause)
- mount_tmpfs(): mounts a tmpfs at an arbitrary path
Implement mount_tee_block_root() function used by both SEV and TDX features to mount /dev/vda and chroot into it. For amd-sev this replaces the previous LUKS/KBS attestation path entirely. The SEV and TDX boot paths are now identical at the init level.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
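Two of the helpers listed in this commit lend themselves to a small sketch. The function names `ignore_ebusy` and `is_mount_point_in` are hypothetical, the errno value is hard-coded where real code would use `libc::EBUSY`, and the parser ignores the octal escapes (such as `\040` for a space) that /proc/mounts uses in paths.

```rust
use std::io;

/// EBUSY on Linux; hard-coded here to keep the sketch free of dependencies.
const EBUSY: i32 = 16;

/// Treats "already mounted" (EBUSY) as success and passes other errors through.
pub fn ignore_ebusy(res: io::Result<()>) -> io::Result<()> {
    match res {
        Err(e) if e.raw_os_error() == Some(EBUSY) => Ok(()),
        other => other,
    }
}

/// Returns true if `path` is the mount target (second field) of any line
/// in `mounts`, which is text in /proc/mounts format.
pub fn is_mount_point_in(mounts: &str, path: &str) -> bool {
    mounts
        .lines()
        .filter_map(|line| line.split_whitespace().nth(1))
        .any(|target| target == path)
}

/// Reads /proc/mounts instead of calling stat() on the path itself.
pub fn is_mount_point(path: &str) -> io::Result<bool> {
    let mounts = std::fs::read_to_string("/proc/mounts")?;
    Ok(is_mount_point_in(&mounts, path))
}
```

Splitting the parser from the file read keeps the matching logic testable without a mounted /proc.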
Extend fs.rs with:
- try_mount(): mounts with a known fstype, or probes /proc/filesystems when fstype is None
- mount_block_root_device(): handles KRUN_BLOCK_ROOT_DEVICE by mounting the block device at /newroot, issuing KRUN_REMOVE_ROOT_DIR_IOCTL to drop the virtiofs temporary root, then pivoting with MS_MOVE
- mount_shared_root(): sets MS_REC|MS_SHARED propagation on /
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Port init/dhcp.c to Rust in init/src/dhcp.rs. The public surface is a single do_dhcp(iface) function with the same behaviour as the C version:
- Sends DHCPDISCOVER with Rapid Commit (option 80)
- On DHCPACK: applies address, route, MTU, and DNS directly
- On DHCPOFFER: completes the 4-way handshake, then applies
- On no response: returns Ok (VM may be IPv6-only)
Netlink structs not exposed by libc (ifinfomsg, ifaddrmsg, rtmsg) are defined locally with #[repr(C)]. sockaddr_nl and sockaddr_in are zero-initialised via mem::zeroed() to handle opaque padding fields.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
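For reference, the options field of a DHCPDISCOVER that requests Rapid Commit can be built as below. This sketch covers only the options (magic cookie, message type, Rapid Commit, end marker) and says nothing about how dhcp.rs lays out the rest of the packet; the function name `discover_options` is hypothetical.

```rust
/// Magic cookie that precedes the DHCP options (99.130.83.99).
const DHCP_MAGIC_COOKIE: [u8; 4] = [0x63, 0x82, 0x53, 0x63];
/// Option 53: DHCP message type, one byte of payload.
const OPT_MESSAGE_TYPE: u8 = 53;
/// Option 80: Rapid Commit, zero-length payload.
const OPT_RAPID_COMMIT: u8 = 80;
/// Option 255: end of options.
const OPT_END: u8 = 255;
/// Message type value for DHCPDISCOVER.
const DHCPDISCOVER: u8 = 1;

/// Builds the options field for a DHCPDISCOVER that asks for Rapid Commit.
pub fn discover_options() -> Vec<u8> {
    let mut opts = Vec::new();
    opts.extend_from_slice(&DHCP_MAGIC_COOKIE);
    opts.extend_from_slice(&[OPT_MESSAGE_TYPE, 1, DHCPDISCOVER]);
    opts.extend_from_slice(&[OPT_RAPID_COMMIT, 0]);
    opts.push(OPT_END);
    opts
}
```

A server that honours Rapid Commit answers such a DISCOVER with a DHCPACK directly, which is the two-message path the commit describes; otherwise the client falls back to the usual four-message exchange.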
Add init/src/config.rs, replacing the hand-rolled jsmn-based parser with serde_json. Parses /.krun_config.json (or KRUN_CONFIG env var) and returns a Config struct with:
- argv: Entrypoint ++ (args | Cmd), or None if absent
- workdir: WorkingDir or Cwd
- tmpfs: first tmpfs mount destination not already mounted
Environment variables from the Env array are applied during parsing, with HOME and TERM always overwritten, all others set only if unset. A missing or unparseable config file is silently ignored.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
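The argv and environment rules above can be read as the following sketch. It leaves out the serde_json parsing and assumes that `args` takes precedence over `Cmd` and that an empty result maps to None; both are interpretations of the commit message, and the function names are hypothetical.

```rust
/// Combines the config fields into an argv: Entrypoint followed by
/// `args` if present, otherwise by `Cmd`. Returns None when nothing is set.
pub fn resolve_argv(
    entrypoint: Option<Vec<String>>,
    args: Option<Vec<String>>,
    cmd: Option<Vec<String>>,
) -> Option<Vec<String>> {
    let mut argv = entrypoint.unwrap_or_default();
    argv.extend(args.or(cmd).unwrap_or_default());
    if argv.is_empty() {
        None
    } else {
        Some(argv)
    }
}

/// Decides whether a variable from the Env array should be applied:
/// HOME and TERM are always overwritten, anything else only if unset.
pub fn should_set(name: &str, already_set: bool) -> bool {
    matches!(name, "HOME" | "TERM") || !already_set
}
```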
Add setup_network() and setup_dhcp() to env.rs. setup_network() brings up lo unconditionally. setup_dhcp() checks that the interface exists before calling do_dhcp(), and logs a warning on failure rather than aborting (DHCP failure is non-fatal — the VM may be IPv6-only or have no network). Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me> Assisted-by: Claude Code:claude-sonnet-4.6
Extend env.rs with:
- apply_hostname(): sets hostname from HOSTNAME env var, defaulting to "localhost"
- apply_env(): maps KRUN_HOME -> HOME and KRUN_TERM -> TERM
- apply_rlimits(): parses the KRUN_RLIMITS comma-separated list of id,cur,max triples and applies each via setrlimit(2)
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Add exec.rs with:
- setup_redirects(): walks /sys/class/virtio-ports and dup2s krun-stdin/stdout/stderr onto the corresponding file descriptors
- set_exit_code(): reports the workload exit code to the host via KRUN_EXIT_CODE_IOCTL, only when the root fs is virtiofs
- run_workload(): forks so PID 1 can reap children; the child calls exec_workload(), which sets up redirects and execvp's the argv. The parent waits for the child, reports the exit code, syncs, and reboots. KRUN_INIT_PID1=1 skips the fork and calls exec_workload() directly as PID 1.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Connect all modules in main() in order:
1. mount_block_root() [amd-sev | tdx]
2. mount_filesystems()
3. mount_block_root_device() [KRUN_BLOCK_ROOT_DEVICE]
4. mount_shared_root()
5. setsid + TIOCSCTTY
6. setup_network()
7. config::load()
8. mount_tmpfs() [config tmpfs mount]
9. apply_env / apply_hostname / apply_rlimits
10. chdir to workdir
11. run_workload(argv)
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Add init/src/freebsd.rs with:
- kenv_get(): reads a variable from the FreeBSD kernel environment via kenv(2), which is the source of env vars for init before the process environment is set up
- populate_env_from_kenv(): imports the known KRUN_* variables from kenv into std::env at startup so the rest of the code can use std::env::var uniformly on both platforms
- open_console(): replicates login_tty(3) without linking libutil: revokes existing opens of /dev/console, opens it, creates a new session via setsid(2), sets the controlling terminal via TIOCSCTTY, and dup2s it onto stdio; falls back to /dev/null + /init.log
- mount_config_iso() / unmount_config_iso(): mounts the KRUN_CONFIG ISO 9660 image at /mnt via nmount(2) so the JSON config file can be read, then unmounts it afterwards
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Connect the FreeBSD helpers into the boot sequence:
- open_console() and populate_env_from_kenv() are called at the very
start of main() before anything else
- setsid/TIOCSCTTY are Linux-only; open_console() handles session setup
on FreeBSD
- setlogin("root") is called on FreeBSD after console setup
- KRUN_DHCP and DHCP setup are Linux-only
- If KRUN_CONFIG is not set, mount_config_iso() is attempted; the ISO
is unmounted immediately after config::load() returns
- fs::* mounts and mount_shared_root are Linux-only
- exec_workload() calls open_console() on FreeBSD instead of
setup_redirects(), giving the child process a fresh controlling
terminal before execvp
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Replace the C-based BSD init build rule (which referenced the now-deleted init/init.c) with a cargo build rule targeting the correct Rust triple.
Makefile:
- Remove dead INIT_SRC = init/init.c variable.
- Derive FREEBSD_RUST_TARGET from the host ARCH with arm64→aarch64 substitution to get the correct Rust triple.
- Set CARGO_BSD_RUSTFLAGS with the clang cross-linker flags (mirroring the existing CC_BSD setup) so cargo can link for FreeBSD.
- aarch64-unknown-freebsd is a Tier 3 target with no prebuilt std; use +nightly -Z build-std for that case.
setup-build-env:
- Add rustup target add x86_64-unknown-freebsd (Tier 2, prebuilt std).
- Install nightly toolchain + rust-src for the aarch64 FreeBSD case.
cross-compilation.yml:
- Add clang to the Linux cross-compilation dependencies so the FreeBSD linker flags resolve correctly on Linux runners.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Implements the timesync feature behind the `timesync` cargo feature flag. Receives host-side nanosecond timestamps over AF_VSOCK/SOCK_DGRAM on port 123 and applies them via clock_settime when the delta exceeds 100ms. Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me> Assisted-by: Claude Code:claude-sonnet-4.6
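The threshold check can be sketched as below. This assumes timestamps are nanoseconds since the epoch held in a `u64` and that the delta is compared as an absolute difference; the function names are hypothetical, and the vsock receive and the clock_settime call are omitted.

```rust
/// Adjust the clock only when host and guest differ by more than 100 ms.
const THRESHOLD_NS: u64 = 100_000_000;
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// True when the guest clock has drifted from the host by more than the threshold.
pub fn needs_adjustment(host_ns: u64, guest_ns: u64) -> bool {
    host_ns.abs_diff(guest_ns) > THRESHOLD_NS
}

/// Splits a nanosecond timestamp into the (seconds, nanoseconds) pair
/// that a timespec passed to clock_settime would carry.
pub fn split_ns(ns: u64) -> (u64, u64) {
    (ns / NANOS_PER_SEC, ns % NANOS_PER_SEC)
}
```

Using an absolute difference makes the check symmetric, so the clock is corrected whether the guest runs ahead of or behind the host.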
Delete init/init.c, init/dhcp.c, init/dhcp.h, init/jsmn.h, and the entire init/tee/ directory (snp_attest.c/h and the KBS client). The amd-sev feature no longer performs LUKS unlock or KBS attestation — it mounts /dev/vda as ext4 like the tdx path does. Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me> Assisted-by: Claude Code:claude-sonnet-4.6
This PR ports the init binary code to Rust. The new init crate is built like any of the other crates in the project.
To run the examples, or to use it with Podman, build the project as usual:
make BLK=1 NET=1 && sudo make BLK=1 NET=1 install
and continue with business as usual.
Fixes: #632